1 Introduction

Project: Part I requires CVA Pricing, Option Pricing and Backtesting to be performed in R. For each topic, the questions require us to:

  1. create and implement formula to derive distributions and theryby calculate CVA price

  2. compare the methods of Monte Carlo and closed form solution for pricing barrier options

  3. conduct backtesting for IBM historical data for the last 10 years and estimate historical volatility.

The report attempts to answer the questions in the order. Each section presents the relevant R code used in the analysis to answer the question and explanation.

2 CVA pricing

2.1 Part i

Write your own Black Scholes function and price [= CBS] a simple call option with the following inputs:

#### Question 1 (i) #####
#### Black-scholes equation

#inputs for BS formula

S0 <- 10  #starting equity price
strike <-12    # strike price
vola <- 0.3  #implied volatility
drift<- 0.01 #the risk free interest rate
T<- 5    #maturity of option in years 

#the BS formula manually = ans = 2.157715

d1 <- 1/(vola*sqrt(T))*(log(S0/strike)+(drift+0.5*vola^2)*(T))
d2 <- d1 - vola*sqrt(T)

price = pnorm(d1,0,1)*S0 - pnorm(d2)*strike*exp(-drift*T)

# Price with Black Scholes formula (load the fOptions R standard package)

price_BSbuil_in_function<-GBSOption(TypeFlag = "c",S = S0,X = strike,Time = T,r = drift, b = drift,sigma = vola)@price

print(paste0("The option price by Black Scholes is ", price))
## [1] "The option price by Black Scholes is 2.1577149922667"

To check my Black Scholes function works correctly, the built-in Black Scholes function from fOption packages was used. The price from my Own Black Scholes function Cbs is the same as the price from built in function:2.157716

2.2 Part ii

Implement a Geometric Brownian Motion (GBM) for the Equity underlying process

#### Question 1 (ii) #####
#### Monte Carlo equation

# Step in 10days with 260 day-count convention
tStep<-10/260
# Scenarios
nPaths<-10000
#Evaluation times in units of year  
time<-seq(0,T,tStep) 

set.seed(12345)
# Parameters (S0, vola, drift) 
#S0 = start value

#start matrix
RF_GBM<-matrix(0,nrow=nPaths,ncol=length(time))

#We generate random numbers N(0,1)
pass_rand<-rnorm(nPaths*(length(time)-1)) #generate 10k numbers
pass_rand_first<-rnorm(nPaths*(length(time)-1)) #second set for part (v)
pass_rand<-matrix(pass_rand,nrow=nPaths,ncol=(length(time)-1)) #distribute it to matrix
pass_rand<-scale(pass_rand,center = TRUE,scale = TRUE)

#Adding the starting value
pass_rand<-cbind(rep(0,nPaths),pass_rand)
#Accrueting the innovations
pass_rand<-t(apply(pass_rand,1,cumsum))
#drift_matrix: x=x0*exp(mu*t-vola^2*t/2)*exp(vola*N(0,1)*sqrt(t))
pass_drift<-matrix(rep(exp(-0.5*time*vola^2+time*drift),nPaths),nrow=nPaths,ncol=length(time),
                   byrow=TRUE)
#Generation of the paths (scaling the random component by tStep)
RF_GBM<-S0*exp(pass_rand*vola*sqrt(tStep))*pass_drift
rownames(RF_GBM)<-paste("history",as.character(seq(1,nPaths,1)),sep="")
colnames(RF_GBM)<-paste(as.character(time),"y",sep="")  

par(mfrow=c(1,1))
matplot(time,t(RF_GBM)[,1:100],type="l",xlab="t(y)",ylab="RF",main="GBM process")
points(time,apply(t(RF_GBM),1,mean),type="l",col=1,lwd=4)
points(time,apply(t(RF_GBM),1,quantile,0.05),type="l",col=2,lwd=4)
points(time,apply(t(RF_GBM),1,quantile,0.95),type="l",col=2,lwd=4)
grid()

#find the column names where expiry is 5 and glue expiry with y=years.
expiry<-5 
ind<-which(colnames(RF_GBM)==paste0(expiry,"y"))
K<-12
ST<-RF_GBM[,ind] #10thousand values for ending paths
plot(ST,pmax(0,ST-K))

#discount all future payoffs(slide 22)
discount<-exp(-drift*T)*pmax(0,ST-K)
#Calculate the mean of discounted payoff distribution
price_MC<-mean(discount)
print(paste0("The option price by Black Scholes is ", price_MC))
## [1] "The option price by Black Scholes is 2.19082518005935"

There is slight difference between Cmc and Cbs but Price via Monte Carlo (Cmc:2.190825) is close to price via Black and Scholes (Cbs=2.157716).

2.3 Part iii

Use your Black Scholes function and the paths derived in the previous step to derive MtM (Mark- to-Market) distributions for the call option over time.

###############################################################################
# define initial/valuation input
S0<-10
s<-0.3
mu<-0
T<-5
r<-0.01

#each time units d/dt = #Evaluation times in units of year 
dt<-10/260  #to get nSteps = 10, 130 is chosen. 260 is the number of business days
nSteps<-T/dt  # 5/(130/260) = 10

###############################################################################
#using my own code of RF_GBM
#need to change RF_GBM, which is discounted steps to expected future payoffs
#RF_GBM is the transpose of S, and also reverse the columns. 
#because MtM looks at time to maturity, and RF_GBM is time away from today. 

#transform RF_GBM into the right form. yes, now it is starting from today to end of path.
RF_GBMT <- t(RF_GBM)

C<-RF_GBMT*0   ## S? or RF_GBMT
for (i in 1:nrow(RF_GBMT)){
  C[i,]<-GBSOption(TypeFlag="c",S=RF_GBMT[i,],X=strike,Time=time[i],r=r,b=r,sigma=vola)@price
}
C0<-exp(-r*T)*mean(pmax(RF_GBMT[nrow(RF_GBMT),] - strike,0))   #RF_GBMT


# plot me results
defaultT<-seq(0,T,dt)
par(mfrow=c(1,1))
matplot(defaultT,C[,1:100],type="l",ylab="Call Option - Black Scholes, price [$]",
        xlab="Default Dates [y]",main="Closed Form Payoffs for Call Option with K=12 ",ylim=c(min(C),max(C)))

2.4 Part iv

Use the MtM distributions from the previous step [a matrix (13110000) is expected] and calcu- late the Expected Exposure (EE), the Potential Future Exposure (PFE) at 95% confidence level and the CVA price using the following counterparty spread curve:

# Calculate EE, PFE, CVA

#MTM time is time away from today, not time to maturity
tStep<-seq(0,T,dt)
MtM <-C

EE<-apply(pmax(MtM,0),1, mean)
PFE<-apply(pmax(MtM,0),1,quantile,0.95)
par(mfrow=c(1,2))
plot(tStep,EE,type="l", main = "Expected Exposure")
plot(tStep,PFE,type="l", main = "Potential Future Exposure")

# here the CDS curve info
cdsT<-seq(1,10,1)
cds<-c(92,104,112,117,120,122,124, 125, 126, 127)
par(mfrow=c(1,1))
plot(cdsT,cds,type="l", main = "10yrCredit Default Swap Spread (Basis Points)")

t<-tStep
r<-0.01
DF<-exp(-r*t)
LGD<-0.4
# Basel formula
CDS_curve<-spline(cdsT, cds, xout=t)
temp<-c()
for (i in 2:length(t)){
  temp<-c(temp,max(0,exp(-(CDS_curve$y[i-1]/10000*CDS_curve$x[i-1])/LGD)-
                     exp(-(CDS_curve$y[i]/10000*CDS_curve$x[i])/LGD))*
            (EE[i-1]*DF[i-1]+EE[i]*DF[i])*0.5)
}

CVABasel<-LGD*sum(temp)

print(paste("The CVA (using Basel formula) for the uncollateralized trade equals ",CVABasel))
## [1] "The CVA (using Basel formula) for the uncollateralized trade equals  0.117508183572168"
#0.121307909656681

2.5 Part v

Now add a second option trade in the portfolio.

#(v) new option!

tStep<-10/260
nPaths<-10000
time<-seq(0,T,tStep) 

# pass_rand_first is the first dw used for the first option.

#to find the second dw2
rho <-0.7
correlatedValue = function(x, r){
  r2 = r**2
  ve = 1-r2
  SD = sqrt(ve)
  e  = rnorm(length(x), mean=0, sd=SD)
  y  = r*x + e
  return(y)
}
x = pass_rand_first
pass_rand_two = correlatedValue(x=x, r=rho)
pass_rand_two<-matrix(pass_rand_two,nrow=nPaths,ncol=(length(time)-1)) #distribute it to matrix
pass_rand_two<-scale(pass_rand_two,center = TRUE,scale = TRUE)
pass_rand_two<-cbind(rep(0,nPaths),pass_rand_two)
pass_rand_two<-t(apply(pass_rand_two,1,cumsum))
pass_drift<-matrix(rep(exp(-0.5*time*vola^2+time*drift),nPaths),nrow=nPaths,ncol=length(time),
                   byrow=TRUE)
RF_GBM2<-S0*exp(pass_rand_two*vola*sqrt(tStep))*pass_drift
rownames(RF_GBM2)<-paste("history",as.character(seq(1,nPaths,1)),sep="")
colnames(RF_GBM2)<-paste(as.character(time),"y",sep="")  

expiry<-5 
ind<-which(colnames(RF_GBM2)==paste0(expiry,"y"))
strike2<-14
ST<-RF_GBM2[,ind] #10thousand values for ending paths
plot(ST,pmax(0,ST-strike2), main="Generated Random Paths for Payoffs with K=14")

#calculate MtM2

RF_GBM2T<-t(RF_GBM2)

C2<-RF_GBM2T*0
for (i in 1:nrow(RF_GBM2T)){
  C2[i,]<-GBSOption(TypeFlag="c",S=RF_GBM2T[i,],X=strike2,Time=time[i],r=r,b=r,sigma=vola)@price
}

defaultT<-seq(0,T,dt)
par(mfrow=c(1,1))
matplot(defaultT,C2[,1:100],type="l",ylab="Call Option - Black Scholes, price [$]",
        xlab="Default Dates [y]",main="Closed Form Payoffs for Call Option K=14",ylim=c(min(C),max(C)))

MtM3 <-C+C2

#EE, PFE and CVA
tStep<-seq(0,T,dt)

EE<-apply(pmax(MtM3,0),1, mean)
PFE<-apply(pmax(MtM3,0),1,quantile,0.95)
par(mfrow=c(1,2))
plot(tStep,EE,type="l", main= "Expected Exposure")
plot(tStep,PFE,type="l" , main = "Potential Future Exposure")

temp<-c()
for (i in 2:length(t)){
  temp<-c(temp,max(0,exp(-(CDS_curve$y[i-1]/10000*CDS_curve$x[i-1])/LGD)-
                     exp(-(CDS_curve$y[i]/10000*CDS_curve$x[i])/LGD))*
            (EE[i-1]*DF[i-1]+EE[i]*DF[i])*0.5)
}

CVABasel<-LGD*sum(temp)
print(paste("The CVA (using Basel formula) for the uncollateralized trade equals ",CVABasel))
## [1] "The CVA (using Basel formula) for the uncollateralized trade equals  0.209649743277873"

3 Option pricing

3.1 Definition of Barrier Options

Barrier options are path dependent options, with payoffs that depend on the price of the underlying asset at expiration and whether or not the asset price crosses a barrier during the life of the option [1]. There are two categories or types of Barrier options: “knock-in” and “knock-out”. “Knock-in” or “in” options are paid for up front, but you do not receive the option until the asset price crosses the barrier. “Knock-out” or “out” options come into existence on the issue date but becomes worthless if the asset price hits the barrier before the expiration date. If the option is a knock-in (knock-out), a predetermined cash rebate may be paid at expiration if the option has not been knocked in (knocked-out) during its lifetime. [1]

According to when options can be exercised, they are classified into mainly three groups; European, Bermudan and American Options. American options can be exercised at any time up to the expiration date, whereas European options can be exercised only on the expiration date itself. Bermudan options are similar in style to American options regarding the possibility of early exercise. However, the difference is the fact that a Bermudan option has predetermined discrete exercise dates [2].

Down-and-in call option

include_graphics("./Siang/downin.png")
Figure 1: Down-and-in Barrier Options [3]

Figure 1: Down-and-in Barrier Options [3]

Down-and-out call option

include_graphics("./Siang/downout.png")
Figure 2: Down-and-out Barrier Options [3]

Figure 2: Down-and-out Barrier Options [3]

In-out parity

European vanilla option = European knock-out option + European knock-in option [3]

  • Suppose an investor holds both a knock-out option and a knock-in option. If the stock price never touches the barrier, the knock-out option will provide the same payoff as that of a vanilla option at maturity. When the stock price touches the barrier, the knock-out option becomes worthless and the knock-in option are enabled and providing the same payoff as that of a vanilla option at maturity.

  • Since the combined payoff of a knock-out option and a knock-in option are the same as that of a vanilla option at maturity, their values today should be equal. This result is termed as “in-out parity.”

  • Alternative method to price knock-in options: Since it needs more computational effort to price a knock-in option, it is possible to price a knock-out option with the same barrier first, and then apply the in-out parity to derive the value of the knock-in option.

3.2 Method 1 - Monte Carlo Method

3.2.1 Analysis

The procedures of applying monte carlo method on barrier options pricing is summarised below:

  • Step 1 - Simulate random paths with GBM process for the spot value until maturity T.
  • Step 2 - Decide whether each path is knocked in or out and thus determine the final payoff of each path.
  • Step 3 - Calculate the payoff at maturity.
  • Step 4 - Discount all future payoff distribution of values to t0.
  • Step 5 - Calculate the mean of discounted payoff distribution.
#########################################
# Step 0 - Define initial/valuation input
#########################################
#Set seed for random numbers
set.seed(12345)
# Simulation period 
T<-0.5
# Step in y with 360 day-count convention
tStep<-1/(360*2)
# Scenarios
nPaths<-10000
#Evaluation times in units of year  
time<-seq(0,T,tStep)

# Parameters (S0, vola, drift)  
start_value<-100
# yearly vola
vola<-0.25
# Annualized rate of interest
r <-0.01
# Annualized cost-of-carry rate
b <-0.01
# Barrier Value
H <- 75
# yearly drift
drift<-r
# Strike price
strike<-90


#######################################################################################
# Step 1 - Simulate random paths with GBM process for the spot value until maturity T.
#######################################################################################
#Definition of the RF box   
RF_GBM<-matrix(0,nrow=nPaths,ncol=length(time))

#We generate random numbers N(0,1)
#Always better to generate the innovations in a vectorial fashion and then "distribute" them
pass_rand<-rnorm(nPaths*(length(time)-1))
pass_rand<-matrix(pass_rand,nrow=nPaths,ncol=(length(time)-1))
pass_rand<-scale(pass_rand,center = TRUE,scale = TRUE)
#Adding the starting value
pass_rand<-cbind(rep(0,nPaths),pass_rand)
#Accrueting the innovations
pass_rand<-t(apply(pass_rand,1,cumsum))
#drift_matrix: x=x0*exp(mu*t-vola^2*t/2)*exp(vola*N(0,1)*sqrt(t))
pass_drift<-matrix(rep(exp(-0.5*time*vola^2+time*drift),nPaths),nrow=nPaths,ncol=length(time),
                   byrow=TRUE)
#Generation of the paths (scaling the random component by tStep)
RF_GBM<-start_value*exp(pass_rand*vola*sqrt(tStep))*pass_drift
rownames(RF_GBM)<-paste("history",as.character(seq(1,nPaths,1)),sep="")
colnames(RF_GBM)<-paste(as.character(time),"y",sep="")  

par(mfrow=c(1,1))
matplot(time,t(RF_GBM)[,1:100],type="l",xlab="t(y)",ylab="RF",main="GBM process")
points(time,apply(t(RF_GBM),1,mean),type="l",col=1,lwd=4)
points(time,apply(t(RF_GBM),1,quantile,0.05),type="l",col=2,lwd=4)
points(time,apply(t(RF_GBM),1,quantile,0.95),type="l",col=2,lwd=4)
grid()

#########################################################
# Step 2 - Decide whether each path is knocked in or out 
# and thus determine the final payoff of each path
#########################################################
#Vanilla Opiton
RF_GBM_vanilla<-RF_GBM

#Exotic Option - down-and-in
# Any prices fall below 75 before the maturity will trigger knocking-in of option
# Once it is in, it's in for good.
RF_GBM_di<-RF_GBM[,ncol(RF_GBM)]*(apply(RF_GBM,1,min) < H)

# Exotic Option - down-and-out
# Any prices fall below 75 before the maturity will trigger knocking-out of option
# Once it is out, it's out for good
RF_GBM_do<-RF_GBM[,ncol(RF_GBM)]*(apply(RF_GBM,1,min) >= H)


###################################################################
# Step 3 - calculate the payoff at maturity
###################################################################
#Vanilla Opiton
payoff_vanilla <- pmax(RF_GBM_vanilla[,ncol(RF_GBM_vanilla)] - strike,0)

#Vanilla Opiton
payoff_di<-pmax(RF_GBM[,ncol(RF_GBM)] - strike,0) * (apply(RF_GBM,1,min) < H) 

#Vanilla Opiton
payoff_do<-pmax(RF_GBM[,ncol(RF_GBM)] - strike,0) * (apply(RF_GBM,1,min) >= H) 

###################################################################
# Step 4 - Discount all future payoff distribution of values to t0.
###################################################################
#Vanilla Opiton
discounted_payoff_vanilla <- exp(-r*T)*payoff_vanilla

#Exotic Option - down-and-in
discounted_payoff_di <- exp(-r*T)*payoff_di

#Exotic Option - down-and-out
discounted_payoff_do <- exp(-r*T)*payoff_do

################################################################
# Step 5 - Calculate the mean of discounted payoff distribution.
################################################################
#Vanilla Opiton
price_vanilla <- mean(discounted_payoff_vanilla)

#Exotic Option - down-and-in
price_di <- mean(discounted_payoff_di)

#Exotic Option - down-and-out
price_do <- mean(discounted_payoff_do)

3.2.2 Results

The payoffs at maturity and prices of options are shown below.

# Check: vanilla option price = knock-out option price + knock-in option price 
#Vanilla Option
plot(RF_GBM_vanilla[,ncol(RF_GBM_vanilla)],payoff_vanilla,pch=19,main="Payoff of Vanilla Option", xlab="Price at Maturity", ylab="Payoff")
abline(v=start_value,col=4)

print(paste0("The price of vanilla option is ", price_vanilla))
## [1] "The price of vanilla option is 13.2164280759335"
#Down-and-in Option   #here is the prolem, RF_GBM_di has been removed
plot(RF_GBM_di,payoff_di,pch=19,main="Payoff of Down-and-in Option", xlab="Price at Maturity", ylab="Payoff")
abline(v=start_value,col=4)

print(paste0("The price of exotic down-and-in option is ", price_di))
## [1] "The price of exotic down-and-in option is 0.0286069347124088"
#Down-and-out Option
plot(RF_GBM_do,payoff_do,pch=19,main="Payoff of Down-and-out Option", xlab="Price at Maturity", ylab="Payoff")
abline(v=start_value,col=4)

print(paste0("The price of exotic down-and-out option is ", price_do))
## [1] "The price of exotic down-and-out option is 13.1878211412211"

The sum of prices of down-and-in and down-and-out options doesn’t seem to agree with the price of vanilla option. The price of down-and-in option is almost zero, which makes total sense since the strike price is 90 and current price is 100, if the contract knocks in when it is below 75 and you buy the asset at the strike price of 90, investors are losing the money for sure.

3.3 Method 2 - R package fExoticOptions

The definitions of the variables required in the StandardBarrierOption function are explained below [1].

  • S is the asset price, a numeric value

  • X is the exercise price, a numeric value

  • H is the barrier value, a numeric value

  • Time is the time to maturity measured in years, a numeric value; e.g. 0.5 means 6 months

  • r is the annualized rate of interest, a numeric value; e.g. 0.25 means 25% pa

  • b is the the annualized cost-of-carry rate, a numeric value; e.g. 0.1 means 10% pa.

  • sigma is the annualized volatility of the underlying security, a numeric value; e.g. 0.3 means 30% volatility pa

  • K for an “In”-Barrier a prespecified cash rebate which is paid out at option expiration if the option has not been knocked in during its lifetime,for an “Out”-Barrier a prespecified cash rebate which is paid out at option expiration if the option has not been knocked out before its lifetime, a numerical value

3.3.1 Vanilla option

GBSOption(TypeFlag="c",S=100,X=90,Time=0.5,r=0.01,b=0.01,sigma=0.25)
## 
## Title:
##  Black Scholes Option Valuation 
## 
## Call:
##  GBSOption(TypeFlag = "c", S = 100, X = 90, Time = 0.5, r = 0.01, 
##      b = 0.01, sigma = 0.25)
## 
## Parameters:
##           Value:
##  TypeFlag c     
##  S        100   
##  X        90    
##  Time     0.5   
##  r        0.01  
##  b        0.01  
##  sigma    0.25  
## 
## Option Price:
##  13.15496 
## 
## Description:
##  Thu Jun  9 21:58:05 2016

The price of the vanilla call option is 13.15496 where a prespecified cash rebate is assumed to be 0.

3.3.2 Down-and-in call option

StandardBarrierOption(TypeFlag = "cdi", S = 100, X = 90,H = 75, K = 0, Time = 0.5, r = 0.01, b = 0.01, sigma = 0.25)
## 
## Title:
##  Standard Barrier Option 
## 
## Call:
##  StandardBarrierOption(TypeFlag = "cdi", S = 100, X = 90, H = 75, 
##      K = 0, Time = 0.5, r = 0.01, b = 0.01, sigma = 0.25)
## 
## Parameters:
##           Value:
##  TypeFlag cdi   
##  S        100   
##  X        90    
##  H        75    
##  K        0     
##  Time     0.5   
##  r        0.01  
##  b        0.01  
##  sigma    0.25  
## 
## Option Price:
##  0.02019197 
## 
## Description:
##  Thu Jun  9 21:58:05 2016

The price of the down-and-in call option is 0.0202 where a prespecified cash rebate is assumed to be 0.

3.3.3 Down-and-out call option

StandardBarrierOption(TypeFlag = "cdo", S = 100, X = 90,H = 75, K = 0, Time = 0.5, r=0.01, b = 0.01, sigma = 0.25)
## 
## Title:
##  Standard Barrier Option 
## 
## Call:
##  StandardBarrierOption(TypeFlag = "cdo", S = 100, X = 90, H = 75, 
##      K = 0, Time = 0.5, r = 0.01, b = 0.01, sigma = 0.25)
## 
## Parameters:
##           Value:
##  TypeFlag cdo   
##  S        100   
##  X        90    
##  H        75    
##  K        0     
##  Time     0.5   
##  r        0.01  
##  b        0.01  
##  sigma    0.25  
## 
## Option Price:
##  13.13477 
## 
## Description:
##  Thu Jun  9 21:58:05 2016

The price of the down-and-out call option is 13.13477 where a prespecified cash rebate is assumed to be 0.

3.4 Evaluation of methods

Question: StandardBarrierOption assumes a continuous monitoring of the barrier instead of the discrete one (i.e. the time grid) of your Monte Carlo. Does this affect your Monte Carlo price? Is the difference between the Monte Carlo prices and the closed form solution affected by both Monte Carlo noise and the type of the barrier monitoring? Please comment.

The monte carlo methods monitor the barrier at half-day basis in our case while the standard barrier option is based on continuous monitoring. The frequency/sequence used to generate the random path for Monte Carlo simulation will affect the Monte Carlo price. For example, discrete monitoring compares the price and barrier at specific time. It fails to recognise P less than H between to monitoring points for down-out options. This is not the case for the continuous monitoring. The difference of monitoring will affect the payoff of barrier option (continuous monitoring is more likely to have worthless payoff). The price of two methods will only coverage when the frequency of monte carlo is high enough, but with high frequency, the dimension would be too large to compute. We do a half-day monitoring in our case giving a 361 x 10000 matrix; if monitor hourly, we are going to deal with a 4321 x 10000 matrix; if by minute or seconds, computational complexity of trying to make discrete monitoring as close to continuous could be very high. Therefore there is some error coming from the fact that your discretize your monitoring process vs a truly continuous monitoring. Based on Broadie et al, it is possible to use the continuous barrier formula to price the discrete barrier option with high accuracy by shifting barrier away from the. underlying. In terms of Monte Carlo noises, although, multiple random paths may yield slight different results, but since we are going to aggregate the results from different paths by taking average, the noises becomes a non-factor with large simulation by the rule of large samples. Therefore, monte Carlo noise is not a major factor of the difference in prices from two methods if the sample size generated by Monte Carlo is sufficient large.

4 Backtesting

4.1 Part 1 - Calculate different volatility estimation

In this part, the volatility is estimated with 22 days, 520 days, EWMA 44 days and GK 66days estimation respectively. The volatility estimation over 10 years time is plotted below.

4.1.1 i.),ii.),iii.),iv.)

##################################################################################
# Calculate volatilities for IBM
##################################################################################

# Load the data
library(quantmod)
options("getSymbols.warning4.0"=FALSE)
getSymbols("IBM",src="yahoo", from = "2006-06-06")
## [1] "IBM"
barChart(IBM)

dates<-as.Date(as.character(rownames(data.frame(IBM)),format="%Y%m%d"))
S<-IBM$IBM.Close

# Calculate volatility with different methods
ret<-diff(log(S))

# Calculate the EWMA volatility
ewmaVola<-function(rets,lambda=0.96){
  weights<- (1-lambda)*lambda^seq(length(rets)-1,0,-1)
  sqrt(sum(weights * (rets-mean(rets))^2))*sqrt(260)
}

# Incorporate information from open, close, high, low
ohlc<-IBM[,c("IBM.Open","IBM.High","IBM.Low","IBM.Close")]
ohlc<-IBM["2006-06-06/2016-06-06"]
N<-260 
n<-66 #90 days window


# Estimate volatility based on the Garman-Klass volatility estimator
volGK <- sqrt( N/n * runSum( .5 * log(ohlc[,2]/ohlc[,3])^2 -
                               (2*log(2)-1) * log(ohlc[,4]/ohlc[,1])^2 , n ) )

plot.xts(rollapply(ret,width = 22,function(x) sd(x))*sqrt(260), type='n',
         main="IBM - volatility",xlab="Date",ylab="Volatility")
lines(rollapply(ret,width = 22,function(x) sd(x))*sqrt(260),lwd=2) #30 days window
points(rollapply(ret,width = 250,function(x) sd(x)*sqrt(260)),col=2,type="l",lwd=2) #250 days window
points(rollapply(ret[2:length(ret)],width = 44,ewmaVola),col=4,type="l",lwd=2) #EWMA weighted vol with lambda 0.96, window 90
points(volGK,type="l",col=6,lwd=2) #GK vol estimation

legend("topright",col = c(1,2,4,6),lty = rep(1,4),lwd=rep(2,4),
       legend = c(expression(sigma["22d"]),
                  expression(sigma["250d"]),
                  expression(sigma["EWMA"]),
                  expression(sigma["GK"])),bty = "n")

4.2 Part 2 Test for different volatility model

In this part, the one year rolling volatility is used to build the 10000 GBM paths with zero drift. Through PIT, each empirical distribution of volatility is transformed into a comparable distribution to uniform distribution to calculate the CVM for each path. The histogram below shows the distribution of the CVM distance of each transformed distribution from uniform distribution of the steps with corresponding horizon.

4.2.1 Volatility estimation with 22 days window

##################################################################################
# 22days rolling windows Vol estimation for IBM
##################################################################################

#1. To generates GBM for testing 
drift<-0.00
sigma<-mean(na.omit(rollapply(ret,width = 252,function(x) sd(x))*sqrt(260)))
Srandom<-GBM(10,drift,sigma,1/252,nrow(S)-1,nPaths)

for (i in 1:length(h)){
  
  h_step<-h[i]
  backtestStart<-seq(1,(nrow(S)-h_step),step)
  backtestEnd<-seq((h_step+1),nrow(S),step)
  
  uniforms<-matrix(NaN,length(backtestStart),nPaths)
  for (j in 1:length(backtestStart)){
    
    logmean<- log(Srandom[backtestStart[j],]) + (drift -sigma^2)*h_step/260
    logvar<- sigma^2*(h_step/260)
    
    for (k in 1:length(logmean)){
      uniforms[j,k]<-plnorm(Srandom[backtestEnd[j],k], meanlog = logmean[k], sdlog = sqrt(logvar))
    }
    

  }
  assign(paste("uniform",h_step,sep=""),uniforms)
}

# just a list to store the expected uniforms from the Monte Carlo
statistics<-list(uniform22=uniform22,uniform66=uniform66,uniform132=uniform132)

# calculate the expected distances
for (i in 1:length(h)){
  h_step<-h[i]
  statistic<-CVM_test(get(paste("uniform",h_step,sep="")),0.001)
  assign(paste("statistic",h_step,sep=""),statistic)
  rm(statistic)
}

# Plot the expected distances
par(mfrow=c(2,2))
for (i in 1:length(h)){
  h_step<-h[i]
  hist(get(paste("statistic",h_step,sep="")),100,main=paste("Statistic for h=",h_step,"d"),
       xlab = "Statistic")
}



# 2. Calculate the realization for forecasting horizons 


logRet<-diff(log(S))

window<-22
sigma<-S*0
for (i in (window):nrow(logRet)){
  sigma[i,1]<-sd(logRet[(i-window+1):i,1])*sqrt(260)
}
sigma[1:(window),1]<-sigma[window+1]



for (i in 1:length(h)){
  
  h_step<-h[i] 
  backtestStart<-seq(1,(nrow(S)-h_step),step)
  backtestEnd<-seq((h_step+1),nrow(S),step)
  
  uniforms<-matrix(NaN,length(backtestStart),1)
  for (j in 1:length(backtestStart)){
    
    logmean<- log(S[backtestStart[j],1]) + (drift -sigma[backtestStart[j]]^2/2)*h_step/260
    logvar<- sigma[backtestStart[j]]^2*(h_step/260)
    
    uniforms[j,1]<-plnorm(S[backtestEnd[j],1], meanlog = logmean, sdlog = sqrt(logvar))
    

  }
  assign(paste("empiricalUniform",h[i] ,sep=""),uniforms)
}

# Plot the empirical uniform distribution
par(mfrow=c(2,2))

for (i in 1:length(h)){
  h_step<-h[i]
  hist(get(paste("empiricalUniform",h_step,sep="")),100,main=paste("Empirical Uniform Distribution  for h=",h_step,"d"),
       xlab = "Statistic")
}


for (i in 1:length(h)){
  h_step<-h[i]
  realization<-CVM_test(get(paste("empiricalUniform",h_step,sep="")),0.001)
  assign(paste("realization",h[i],sep=""),realization)
  rm(realization)
}

par(mfrow=c(2,2))

for (i in 1:length(h)){
  h_step<-h[i]
  hist(get(paste("statistic",h[i],sep="")),100,main=paste("Backtest result for h=",h_step,"d"),
       xlab = "Statistic")
  quantile99<-sort(get(paste("statistic",h[i],sep="")))[0.99*nPaths]
  abline(v=quantile99,col=2,lwd=4)
  if (get(paste("realization",h[i],sep=""))>=quantile99){color<-2}else{color<-3}
  
  abline(v=get(paste("realization",h[i],sep="")),col=color,lwd=5,lty=2)
}

4.2.2 Volatility estimation with 520 days window

##################################################################################
# 520days rolling windows Vol estimation for IBM
##################################################################################

#1. To generates GBM for testing 
drift<-0.00
sigma<-mean(na.omit(rollapply(ret,width = 252,function(x) sd(x))*sqrt(260)))
Srandom<-GBM(10,drift,sigma,1/252,nrow(S)-1,nPaths)

for (i in 1:length(h)){
  
  h_step<-h[i]
  backtestStart<-seq(1,(nrow(S)-h_step),step)
  backtestEnd<-seq((h_step+1),nrow(S),step)
  
  uniforms<-matrix(NaN,length(backtestStart),nPaths)
  for (j in 1:length(backtestStart)){
    
    logmean<- log(Srandom[backtestStart[j],]) + (drift -sigma^2)*h_step/260
    logvar<- sigma^2*(h_step/260)
    
    for (k in 1:length(logmean)){
      uniforms[j,k]<-plnorm(Srandom[backtestEnd[j],k], meanlog = logmean[k], sdlog = sqrt(logvar))
    }
    

  }
  assign(paste("uniform",h_step,sep=""),uniforms)
}

# just a list to store the expected uniforms from the Monte Carlo
statistics<-list(uniform22=uniform22,uniform66=uniform66,uniform132=uniform132)

# calculate the expected distances
for (i in 1:length(h)){
  h_step<-h[i]
  statistic<-CVM_test(get(paste("uniform",h_step,sep="")),0.001)
  assign(paste("statistic",h_step,sep=""),statistic)
  rm(statistic)
}

# Plot the expected distances
par(mfrow=c(2,2))
for (i in 1:length(h)){
  h_step<-h[i]
  hist(get(paste("statistic",h_step,sep="")),100,main=paste("Statistic for h=",h_step,"d"),
       xlab = "Statistic")
}



# 2. Calculate the realization for forecasting horizons 


logRet<-diff(log(S))

window<-520
sigma<-S*0
for (i in (window):nrow(logRet)){
  sigma[i,1]<-sd(logRet[(i-window+1):i,1])*sqrt(260)
}
sigma[1:(window),1]<-sigma[window+1]



for (i in 1:length(h)){
  
  h_step<-h[i] 
  backtestStart<-seq(1,(nrow(S)-h_step),step)
  backtestEnd<-seq((h_step+1),nrow(S),step)
  
  uniforms<-matrix(NaN,length(backtestStart),1)
  for (j in 1:length(backtestStart)){
    
    logmean<- log(S[backtestStart[j],1]) + (drift -sigma[backtestStart[j]]^2/2)*h_step/260
    logvar<- sigma[backtestStart[j]]^2*(h_step/260)
    
    uniforms[j,1]<-plnorm(S[backtestEnd[j],1], meanlog = logmean, sdlog = sqrt(logvar))
    

  }
  assign(paste("empiricalUniform",h[i] ,sep=""),uniforms)
}

# Plot the empirical uniform distribution
par(mfrow=c(2,2))

for (i in 1:length(h)){
  h_step<-h[i]
  hist(get(paste("empiricalUniform",h_step,sep="")),100,main=paste("Empirical Uniform Distribution  for h=",h_step,"d"),
       xlab = "Statistic")
}

for (i in 1:length(h)){
  h_step<-h[i]
  realization<-CVM_test(get(paste("empiricalUniform",h_step,sep="")),0.001)
  assign(paste("realization",h[i],sep=""),realization)
  rm(realization)
}

par(mfrow=c(2,2))

for (i in 1:length(h)){
  h_step<-h[i]
  hist(get(paste("statistic",h[i],sep="")),100,main=paste("Backtest result for h=",h_step,"d"),
       xlab = "Statistic")
  quantile99<-sort(get(paste("statistic",h[i],sep="")))[0.99*nPaths]
  abline(v=quantile99,col=2,lwd=4)
  if (get(paste("realization",h[i],sep=""))>=quantile99){color<-2}else{color<-3}
  
  abline(v=get(paste("realization",h[i],sep="")),col=color,lwd=5,lty=2)
}

4.2.3 Volatility estimation with EWMA with 44 days window

##################################################################################
# EWMA Vol estimation for IBM
##################################################################################

#1. To generates GBM for testing 
drift<-0.00
sigma<-mean(na.omit(rollapply(ret,width = 252,function(x) sd(x))*sqrt(260)))
Srandom<-GBM(10,drift,sigma,1/252,nrow(S)-1,nPaths)

for (i in 1:length(h)){
  
  h_step<-h[i]
  backtestStart<-seq(1,(nrow(S)-h_step),step)
  backtestEnd<-seq((h_step+1),nrow(S),step)
  
  uniforms<-matrix(NaN,length(backtestStart),nPaths)
  for (j in 1:length(backtestStart)){
    
    logmean<- log(Srandom[backtestStart[j],]) + (drift -sigma^2)*h_step/260
    logvar<- sigma^2*(h_step/260)
    
    for (k in 1:length(logmean)){
      uniforms[j,k]<-plnorm(Srandom[backtestEnd[j],k], meanlog = logmean[k], sdlog = sqrt(logvar))
    }
    

  }
  assign(paste("uniform",h_step,sep=""),uniforms)
}

# just a list to store the expected uniforms from the Monte Carlo
statistics<-list(uniform22=uniform22,uniform66=uniform66,uniform132=uniform132)

# calculate the expected distances
for (i in 1:length(h)){
  h_step<-h[i]
  statistic<-CVM_test(get(paste("uniform",h_step,sep="")),0.001)
  assign(paste("statistic",h_step,sep=""),statistic)
  rm(statistic)
}

# Plot the expected distances
par(mfrow=c(2,2))
for (i in 1:length(h)){
  h_step<-h[i]
  hist(get(paste("statistic",h_step,sep="")),100,main=paste("Statistic for h=",h_step,"d"),
       xlab = "Statistic")
}



# 2. Calculate the realization for forecasting horizons 


logRet<-diff(log(S))

window<-44
sigma<-S*0
for (i in (window+1):nrow(logRet)){
  sigma[i,1]<-ewmaVola(logRet[(i-window+1):i,1])
}
sigma[1:(window),1]<-sigma[window+1]



for (i in 1:length(h)){
  
  h_step<-h[i] 
  backtestStart<-seq(1,(nrow(S)-h_step),step)
  backtestEnd<-seq((h_step+1),nrow(S),step)
  
  uniforms<-matrix(NaN,length(backtestStart),1)
  for (j in 1:length(backtestStart)){
    
    logmean<- log(S[backtestStart[j],1]) + (drift -sigma[backtestStart[j]]^2/2)*h_step/260
    logvar<- sigma[backtestStart[j]]^2*(h_step/260)
    
    uniforms[j,1]<-plnorm(S[backtestEnd[j],1], meanlog = logmean, sdlog = sqrt(logvar))
    

  }
  assign(paste("empiricalUniform",h[i] ,sep=""),uniforms)
}

# Plot the empirical uniform distribution
par(mfrow=c(2,2))

for (i in 1:length(h)){
  h_step<-h[i]
  hist(get(paste("empiricalUniform",h_step,sep="")),100,main=paste("Empirical Uniform Distribution  for h=",h_step,"d"),
       xlab = "Statistic")
}

for (i in 1:length(h)){
  h_step<-h[i]
  realization<-CVM_test(get(paste("empiricalUniform",h_step,sep="")),0.001)
  assign(paste("realization",h[i],sep=""),realization)
  rm(realization)
}

par(mfrow=c(2,2))

for (i in 1:length(h)){
  h_step<-h[i]
  hist(get(paste("statistic",h[i],sep="")),100,main=paste("Backtest result for h=",h_step,"d"),
       xlab = "Statistic")
  quantile99<-sort(get(paste("statistic",h[i],sep="")))[0.99*nPaths]
  abline(v=quantile99,col=2,lwd=4)
  if (get(paste("realization",h[i],sep=""))>=quantile99){color<-2}else{color<-3}
  
  abline(v=get(paste("realization",h[i],sep="")),col=color,lwd=5,lty=2)
}

4.2.4 Volatility estimation with GK with 66 days window

##################################################################################
# GK Vol estimation for IBM
##################################################################################

#1. To generates GBM for testing 

drift<-0.00
sigma<-mean(na.omit(rollapply(ret,width = 252,function(x) sd(x))*sqrt(260)))
Srandom<-GBM(10,drift,sigma,1/252,nrow(S)-1,nPaths)

for (i in 1:length(h)){
  
  h_step<-h[i]
  backtestStart<-seq(1,(nrow(S)-h_step),step)
  backtestEnd<-seq((h_step+1),nrow(S),step)
  
  uniforms<-matrix(NaN,length(backtestStart),nPaths)
  for (j in 1:length(backtestStart)){
    
    logmean<- log(Srandom[backtestStart[j],]) + (drift -sigma^2)*h_step/260
    logvar<- sigma^2*(h_step/260)
    
    for (k in 1:length(logmean)){
      uniforms[j,k]<-plnorm(Srandom[backtestEnd[j],k], meanlog = logmean[k], sdlog = sqrt(logvar))
    }
  }
  assign(paste("uniform",h_step,sep=""),uniforms)
}

# just a list to store the expected uniforms from the Monte Carlo
statistics<-list(uniform22=uniform22,uniform66=uniform66,uniform132=uniform132)

# calculate the expected distances
for (i in 1:length(h)){
  h_step<-h[i]
  statistic<-CVM_test(get(paste("uniform",h_step,sep="")),0.001)
  assign(paste("statistic",h_step,sep=""),statistic)
  rm(statistic)
}

# Plot the expected distances
par(mfrow=c(2,2))
for (i in 1:length(h)){
  h_step<-h[i]
  hist(get(paste("statistic",h_step,sep="")),100,main=paste("Statistic for h=",h_step,"d"),
       xlab = "Statistic")
}



# 2. Calculate the realization for forecasting horizons 


logRet<-diff(log(S))

window<-66

sigma<-S*0
sigma=volGK
sigma[1:(window),1]<-sigma[window+1]



for (i in 1:length(h)){
  
  h_step<-h[i] 
  backtestStart<-seq(1,(nrow(S)-h_step),step)
  backtestEnd<-seq((h_step+1),nrow(S),step)
  
  uniforms<-matrix(NaN,length(backtestStart),1)
  for (j in 1:length(backtestStart)){
    
    logmean<- log(S[backtestStart[j],1]) + (drift -sigma[backtestStart[j]]^2/2)*h_step/260
    logvar<- sigma[backtestStart[j]]^2*(h_step/260)
    
    uniforms[j,1]<-plnorm(S[backtestEnd[j],1], meanlog = logmean, sdlog = sqrt(logvar))
    

  }
  assign(paste("empiricalUniform",h[i] ,sep=""),uniforms)
}

# Plot the empirical uniform distribution
par(mfrow=c(2,2))

for (i in 1:length(h)){
  h_step<-h[i]
  hist(get(paste("empiricalUniform",h_step,sep="")),100,main=paste("Empirical Uniform Distribution  for h=",h_step,"d"),
       xlab = "Statistic")
}

for (i in 1:length(h)){
  h_step<-h[i]
  realization<-CVM_test(get(paste("empiricalUniform",h_step,sep="")),0.001)
  assign(paste("realization",h[i],sep=""),realization)
  rm(realization)
}

par(mfrow=c(2,2))

for (i in 1:length(h)){
  h_step<-h[i]
  hist(get(paste("statistic",h[i],sep="")),100,main=paste("Backtest result for h=",h_step,"d"),
       xlab = "Statistic")
  quantile99<-sort(get(paste("statistic",h[i],sep="")))[0.99*nPaths]
  abline(v=quantile99,col=2,lwd=4)
  if (get(paste("realization",h[i],sep=""))>=quantile99){color<-2}else{color<-3}
  
  abline(v=get(paste("realization",h[i],sep="")),col=color,lwd=5,lty=2)
}

4.3 Part 3 Evaluation of Results

In this part, the results from the tests will be interpreted and a best model for each horizon will be chosen with evaluation and arguments.

4.4 General observation

From the backtesting with GBM, all the realisations of all the volatility models lie within the 99% level.

However, their empirical distributions are generally not quite uniformly distributed with spikes around 0 and 1. This is probably caused by wrong estimation of drift which, in this case, is assumed to be zero. From the time series plot of IBM, there are consistent upward trends and downward trends within certain period which explains why there are high frequency on both ends. The volatility estimates that are based on past volalities will not be able to capture the spikes well, decreasing the accuracy of the estimates. As upward trends occupy a longer time period, the frequency around 0 is found to be higher than the other side.

Also, a dome shaped structure is also obsvered, indicating that the volatility is oftenly over estimated. Such error could be caused by an extreme change of volatility within the 10 years record. A rapid increase of volatility causes an under-estimation of volatility while a rapid decrease of volatility causes an over-estimation of volatility. From the volatility graph of IBM, we can see that there are quite a lot of spikes and a single large spike appeared around the period 2008 - 2009 which explain the over and under estimation of volatility.

As the GBM back testing for all the estimation has similar quality, our evaluation of different model will be mainly based on the PIT of the empirical distribution.

4.4.1 22 Days Horizon

The 22 days horizon gives the most sactisfactory result in all the test. This is probably because of the vigorous changes appeared within 10 years period. The empirical distribution closest to a uniform distribution is the GK volatility estimation model.

4.4.2 66 Days Horizon

For 66 days horizon, the 22days and 520 days roling window estimations gives a rather negatively skewed distribution while EWMA and the GK estimation manage to give a little bit better prediction with the empirical distribution closer to uniform distribution.

4.4.3 132 Days Horizon

For the 132 days horizon, all four estimations methods give negatively skewed empirical distribution with no obvious winner.

5 Reference